Improve histogram consistency #161

dolik-rce · 2023-08-24T10:49:07Z

Hi, I have noticed that some of our tooling has problems with histograms scraped from nginx.

The problem seems to be that one worker process handles a collect() call while another one is in the middle of updating the histogram counters. As a result, the scraped data can contain a higher count for some le values than for le="+Inf".
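To make the race concrete, here is a small Python simulation (not the library's Lua code). It replays one observation one bucket increment at a time and records every intermediate state that a concurrent collect() could read. The bucket bounds, the starting counts, and the pre-fix update order (finite buckets in descending order, +Inf last) are all assumptions chosen for illustration.

```python
import math

INF = math.inf

# Hypothetical consistent state: 7 observations so far, none above 1.0,
# so le="1.0" and le="+Inf" hold the same count.
START = {0.1: 3, 0.5: 5, 1.0: 7, INF: 7}

# Assumed pre-fix order: finite buckets descending, +Inf incremented last.
OLD_ORDER = [1.0, 0.5, 0.1, INF]


def torn_reads(state, value, update_order):
    """Replay observe(value) one increment at a time and return every
    intermediate state (as a concurrent collect() could read it) in which
    some finite bucket exceeds le="+Inf"."""
    state = dict(state)
    bad = []
    for bound in update_order:
        if value <= bound:  # cumulative: every bucket with le >= value grows
            state[bound] += 1
            if any(state[b] > state[INF] for b in state if b != INF):
                bad.append(dict(state))
    return bad


torn = torn_reads(START, 0.05, OLD_ORDER)
print(len(torn))  # 3
print(torn[0])    # {0.1: 3, 0.5: 5, 1.0: 8, inf: 7}
```

Whenever no observation has exceeded the largest finite bound, that bucket equals +Inf, so any increment applied before +Inf is visible as a bucket that is larger than +Inf.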

Such an inconsistency breaks all kinds of assumptions that various tools might make. For us it manifested as Prometheus concluding that some counters produced by recording rules had been reset (because {le="+Inf"} - {le="0.1"} was negative), which resulted in huge jumps in the metrics.
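Prometheus treats any decrease in a counter as a reset, so a single torn scrape is enough to produce a jump. The sketch below is a simplified version of that reset handling (it omits the extrapolation that `rate()` and `increase()` actually perform). It is applied to a hypothetical recording-rule series counting observations slower than 0.1, in which the third sample was computed from a torn scrape and is therefore one lower than it should be.

```python
def naive_increase(samples):
    """Total increase over a series, treating any drop as a counter reset.

    Simplified sketch of Prometheus reset handling, without extrapolation:
    when a sample is lower than the previous one, the counter is assumed to
    have restarted from zero, so the whole new value counts as increase.
    """
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev:
            total += cur  # treated as a reset: counter restarted from 0
        else:
            total += cur - prev
    return total


# Hypothetical samples of {le="+Inf"} - {le="0.1"}; the third comes from a
# torn scrape where le="0.1" was already incremented but +Inf was not.
print(naive_increase([40, 41, 40, 42]))  # 43, although the true increase is 2
```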

The proposed fix is simple: update the +Inf bucket before the other buckets (which are already incremented in descending order). A scrape can still return inconsistent values, but at least the common assumption that bucket counts never decrease as le grows can no longer be broken.
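Why the order matters can be checked exhaustively on a small model. The Python sketch below (an illustration, not the library's code) starts from every consistent cumulative state with small counts, replays observations one increment at a time, and reports whether any intermediate state violates monotonicity. The bounds, the sample values, and the assumption that +Inf was previously updated last are hypothetical. With +Inf first followed by the finite buckets in descending order, each partial update touches only a top run of buckets, so counts stay non-decreasing.

```python
import itertools
import math

BOUNDS = [0.1, 0.5, 1.0, math.inf]  # hypothetical bucket upper bounds


def is_monotonic(state):
    """True if counts never decrease as the bucket bound grows."""
    vals = [state[b] for b in BOUNDS]
    return all(a <= b for a, b in zip(vals, vals[1:]))


def order_is_safe(order, max_count=3, values=(0.05, 0.3, 0.7, 5.0)):
    """Check every state a reader could observe mid-update, for every
    consistent start state with counts up to max_count."""
    for counts in itertools.product(range(max_count + 1), repeat=len(BOUNDS)):
        state = dict(zip(BOUNDS, counts))
        if not is_monotonic(state):
            continue  # only start from consistent states
        for v in values:
            s = dict(state)
            for b in order:
                if v <= b:
                    s[b] += 1
                    if not is_monotonic(s):
                        return False
    return True


FINITE_DESC = sorted(BOUNDS[:-1], reverse=True)  # [1.0, 0.5, 0.1]
OLD_ORDER = FINITE_DESC + [math.inf]             # +Inf last (assumed pre-fix)
NEW_ORDER = [math.inf] + FINITE_DESC             # +Inf first (the fix)

print(order_is_safe(OLD_ORDER))  # False
print(order_is_safe(NEW_ORDER))  # True
```

As the PR notes, this only guarantees monotonic buckets; a scrape can still see an observation in +Inf (or in `_count`) that has not reached the smaller buckets yet.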

A proper fix would probably require locking, as there is, AFAIK, no way to atomically increment multiple values in the shared dict.
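As an analogy for the locking approach, here is a Python sketch in which threads stand in for nginx worker processes. `LockedHistogram` is a hypothetical class: observe() and collect() share one lock, so a reader never sees a half-applied observation. In nginx the lock would have to work across worker processes (for example one built on a shared dict, such as lua-resty-lock), which this sketch does not model. The trade-off is that every observation and every scrape now serialize on the lock.

```python
import math
import threading

BOUNDS = [0.1, 0.5, 1.0, math.inf]  # hypothetical bucket upper bounds


class LockedHistogram:
    """Hypothetical cumulative histogram guarded by a single lock."""

    def __init__(self, bounds):
        self._bounds = sorted(bounds)
        self._counts = {b: 0 for b in self._bounds}
        self._sum = 0.0
        self._lock = threading.Lock()

    def observe(self, value):
        with self._lock:  # all bucket updates become one atomic step
            for b in self._bounds:
                if value <= b:
                    self._counts[b] += 1
            self._sum += value

    def collect(self):
        with self._lock:  # snapshot taken while no update is in progress
            return dict(self._counts), self._sum


hist = LockedHistogram(BOUNDS)
violations = []
stop = threading.Event()


def scraper():
    # Repeatedly collect and record any snapshot with decreasing buckets.
    while not stop.is_set():
        counts, _ = hist.collect()
        vals = [counts[b] for b in sorted(counts)]
        if any(a > b for a, b in zip(vals, vals[1:])):
            violations.append(counts)


def writer():
    for i in range(2000):
        hist.observe((i % 20) / 10)  # values 0.0, 0.1, ..., 1.9


reader = threading.Thread(target=scraper)
reader.start()
writers = [threading.Thread(target=writer) for _ in range(4)]
for w in writers:
    w.start()
for w in writers:
    w.join()
stop.set()
reader.join()
print(len(violations))  # 0
```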

knyar · 2023-08-24T13:09:27Z

Thank you! Do you want me to cut a new release that will include this change?

dolik-rce · 2023-08-24T13:37:55Z

We have found a workaround that avoids using histograms, which solves the problem completely, so we are not in a hurry. The release can wait until there is something more important.

BTW: It might be a good idea to mention this histogram atomicity problem in the documentation...

improve histogram consistency

bb56e52

knyar merged commit 0d790d0 into knyar:main Aug 24, 2023
3 checks passed

dolik-rce deleted the improve-histogram-consistency branch August 24, 2023 13:38

knyar mentioned this pull request Jan 14, 2024

why is there no del method for histogram types #165

Closed

knyar added a commit that referenced this pull request May 25, 2024

Document histogram consistency limitations (#161)

d51ef8f
